Abstract

UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB, a sampling policy for the Multi-armed Bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final "arm pull" (the actual move selection) that collects a reward, rather than all "arm pulls". Therefore, it makes more sense to minimize the simple regret, as opposed to the cumulative regret. We begin by introducing policies for multi-armed bandits with lower finite-time and asymptotic simple regret than UCB, and use them to develop a two-stage scheme (SR+CR) for MCTS which outperforms UCT empirically. Optimizing the sampling process is itself a metareasoning problem, a solution of which can use value of information (VOI) techniques. Although the theory of VOI for search exists, applying it to MCTS is non-trivial, as typical myopic assumptions fail. Lacking a complete working VOI theory for MCTS, we nevertheless propose a sampling scheme that is "aware" of VOI, achieving an algorithm that in empirical evaluation outperforms both UCT and the other proposed algorithms.

Introduction 

Monte-Carlo tree search, and especially a version based on the UCT formula (Kocsis and Szepesvari 2006), appears in numerous search applications, such as (Gelly and Wang 2006; Eyerich, Keller, and Helmert 2010). Although these methods are shown to be successful empirically, most authors appear to be using the UCT formula "because it has been shown to be successful in the past", and "because it does a good job of trading off exploration and exploitation". While the latter statement may be correct for the multi-armed bandit and for the UCB method (Auer, Cesa-Bianchi, and Fischer 2002), we argue that it is inappropriate for search. The problem is not that UCT does not work; rather, a simple reconsideration from basic principles can result in schemes that outperform UCT.

The core issue is that in adversarial search and in search in "games against nature" (optimizing behavior under uncertainty), the goal is typically either to find a good (or optimal) strategy, or even just to find the best first action of such a policy. Once such an action is discovered, it is usually not beneficial to further sample that action; "exploitation" is thus meaningless for search problems. Finding a good first action is closer to the pure exploration variant, as seen in the selection problem (Bubeck, Munos, and Stoltz 2011; Tolpin and Shimony 2012). In the selection problem, it is much better to minimize the simple regret. However, the simple and the cumulative regret cannot be minimized simultaneously; moreover, Bubeck, Munos, and Stoltz (2011) show that in many cases the smaller the cumulative regret, the greater the simple regret.

We begin with background definitions and related work. We then introduce sampling schemes with better bounds on the simple regret on sets than UCB; this is the first contribution of this paper. The results are applied to sampling in trees by combining the proposed sampling schemes on the first step of a rollout with UCT for the rest of the rollout. An additional sampling scheme based on metareasoning principles is also suggested, another contribution of this paper. Finally, the performance of the proposed sampling schemes is evaluated on sets of Bernoulli arms, in randomly generated 2-level trees, and on the sailing domain, showing where the proposed schemes have improved performance.

Background and Related Work 

Monte-Carlo tree search was initially suggested as a scheme for finding approximately optimal policies for Markov Decision Processes (MDP). An MDP is defined by the set of states $S$, the set of actions $A$ (also called moves in this paper), the transition distribution $T(s, a, s')$, the reward function $R(s, a, s')$, the initial state $s$, and an optional goal state $t$: $\langle S, A, T, R, s, t \rangle$ (Russell and Norvig 2003). Several MCTS schemes explore an MDP by performing rollouts: trajectories from the current state to a state in which a termination condition is satisfied (either the goal state, or a cutoff state for which the reward is evaluated approximately).

Multi-armed bandits and UCT 
In the Multi-armed Bandit problem (Vermorel and Mohri 2005), we have a set of $K$ arms (see Figure 1a). Each arm can be pulled multiple times. Sometimes a cost is associated with each pulling action. When the $i$th arm is pulled, a random reward $X_i$ from an unknown stationary distribution is encountered. The reward is usually bounded between 0 and 1. In the cumulative setting (the focus of much of the research literature on Multi-armed bandits), all encountered rewards are collected by the agent. The UCB scheme was shown to be near-optimal in this respect (Auer, Cesa-Bianchi, and Fischer 2002):

Definition 1. Scheme UCB(c) pulls arm $i$ that maximizes the upper confidence bound $b_i$ on the reward:

$$b_i = \overline{X}_i + \sqrt{\frac{c \log(n)}{n_i}} \qquad (1)$$

where $\overline{X}_i$ is the average sample reward obtained from arm $i$, $n_i$ is the number of times arm $i$ was pulled, and $n$ is the total number of pulls so far.
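A minimal Python sketch of the UCB(c) rule in Equation (1) follows. It assumes rewards in [0, 1] and the natural logarithm. Because $b_i$ is undefined for $n_i = 0$, it also assumes that every arm is pulled once before the formula applies, which the definition leaves open. The function name `ucb_select` and the list-based statistics are illustrative, not taken from the paper.

```python
import math

def ucb_select(sums, counts, c=2.0):
    """Return the arm maximizing b_i = mean_i + sqrt(c * log(n) / n_i) (Equation 1).

    sums[i]   -- total reward collected from arm i
    counts[i] -- number of pulls n_i of arm i
    Assumption: arms never pulled are selected first, since b_i needs n_i > 0.
    """
    for i, n_i in enumerate(counts):
        if n_i == 0:
            return i
    n = sum(counts)  # total number of pulls so far

    def upper_bound(i):
        return sums[i] / counts[i] + math.sqrt(c * math.log(n) / counts[i])

    return max(range(len(counts)), key=upper_bound)
```

For example, with 900 reward out of 1000 pulls on arm 0 and 50 out of 100 on arm 1, `ucb_select([900, 50], [1000, 100])` picks arm 0. The exploration term for arm 1 is too small to overcome the gap in sample means.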

The UCT algorithm, an extension of UCB to Monte-Carlo Tree Search, is described in (Kocsis and Szepesvari 2006), and has been shown to outperform many state-of-the-art search algorithms in both MDPs and adversarial games (Eyerich, Keller, and Helmert 2010; Gelly and Wang 2006).

In the simple regret (selection) setting, the agent gets to collect only the reward of the last pull.
Definition 2. The simple regret $\mathbb{E}r$ of a sampling policy for the Multi-armed Bandit Problem is the expected difference between the best true expected reward $\mu^*$ and the true expected reward $\mu_j$ of the arm with the greatest sample mean, $j = \arg\max_i \overline{X}_i$:

$$\mathbb{E}r = \sum_{j=1}^{K} \Delta_j \Pr\left(j = \arg\max_i \overline{X}_i\right) \qquad (2)$$

where $\Delta_j = \mu^* - \mu_j$.
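Equation (2) can be estimated by simulation. Run a sampling policy for $n$ pulls, recommend the arm with the greatest sample mean, and average $\Delta_j = \mu^* - \mu_j$ over many repetitions. The sketch below assumes Bernoulli arms, as in the experiments later in the paper, and uses round-robin (uniform) sampling as the example policy. The names `round_robin` and `estimate_simple_regret` are hypothetical.

```python
import random

def round_robin(sums, counts, rng):
    """Uniform sampling: pull the least-pulled arm (ties go to the lowest index)."""
    return min(range(len(counts)), key=lambda i: counts[i])

def estimate_simple_regret(mus, n, policy, runs=1000, seed=0):
    """Monte Carlo estimate of E r = sum_j Delta_j Pr(j = argmax_i mean_i), Equation (2).

    mus    -- true expected rewards of Bernoulli arms (unknown to the policy)
    policy -- function (sums, counts, rng) -> index of the arm to pull
    """
    rng = random.Random(seed)
    mu_star = max(mus)
    total = 0.0
    for _ in range(runs):
        sums = [0.0] * len(mus)
        counts = [0] * len(mus)
        for _ in range(n):
            i = policy(sums, counts, rng)
            sums[i] += 1.0 if rng.random() < mus[i] else 0.0  # Bernoulli reward
            counts[i] += 1
        # Recommend the arm with the greatest sample mean (unpulled arms count as 0).
        means = [s / k if k else 0.0 for s, k in zip(sums, counts)]
        j = max(range(len(mus)), key=lambda i: means[i])
        total += mu_star - mus[j]  # Delta_j; only the final choice is rewarded
    return total / runs
```

Any of the sampling rules discussed in this paper can be passed as `policy`, as long as it follows the same `(sums, counts, rng)` signature.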

Strategies that minimize the simple regret are called pure exploration strategies (Bubeck, Munos, and Stoltz 2011).



An upper bound on the simple regret of uniform sampling is exponentially decreasing in the number of samples (see Bubeck, Munos, and Stoltz 2011, Proposition 1). For UCB(c), the best known upper bound on the simple regret is only polynomially decreasing in the number of samples (see Bubeck, Munos, and Stoltz 2011, Theorems 2 and 3). Empirically, however, UCB(c) appears to yield a lower simple regret than uniform sampling.

Metareasoning 

A completely different scheme for control of sampling can use the principles of bounded rationality (Horvitz 1987) and metareasoning: Russell and Wefald (1991) provided a formal description of rational metareasoning and case studies of applications in several problem domains. In search, under myopic and sub-tree independence assumptions, one maintains a current best move $\alpha$ at the root, and finds the expected gain from finding another move $\beta$ to be better than the current best (Russell and Wefald 1991). The "cost" of search actions can also be factored in. Ideally, an "optimal" sampling scheme, to be used for selecting what to sample both at the root node (Hay and Russell 2011) and elsewhere, can be developed using metareasoning. However, this task is daunting for the following reasons:



• The method is in general intractable, necessitating simplifying assumptions. However, using the standard metareasoning myopic assumption, where samples are selected as though at most one sample can be taken before an action is chosen, we run into serious problems. Even the basic selection problem (Tolpin and Shimony 2012) exhibits a non-concave utility function and results in premature stopping of the standard myopic algorithms. This is because the value of information of a single measurement (analogous to a sample in MCTS) is frequently less than its time-cost, even though this is not true for multiple measurements.

When applying the selection problem to MCTS, the sit- 
uation is exacerbated. The utility of an action is usually 
bounded, and thus in many cases a single sample may be 
insufficient to change the current best action, regardless 
of its outcome. As a result, we frequently get a zero "my- 
opic" value of information for a single sample. 

• Rational metareasoning requires a known distribution 
model, which may be difficult to obtain. 

• Defining the time-cost of a sample is not trivial. 
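A small worked example of the zero myopic VOI problem, assuming 0/1 rewards and hypothetical sample statistics. Suppose the current best action has sample mean 0.8 over 10 samples and another action has mean 0.5 over 10 samples. One more sample of the second action can raise its mean to at most $(5+1)/11 \approx 0.545$. No outcome of that single sample changes the current best action, so its myopic VOI is zero. The hypothetical helper below checks this for arbitrary statistics.

```python
def single_sample_can_change_best(best_sum, best_n, other_sum, other_n):
    """True if one more sample of the 'other' action (reward in [0, 1]) could
    lift its sample mean above that of the current best action."""
    best_mean = best_sum / best_n
    # Most optimistic outcome: the extra sample returns the maximum reward, 1.
    max_other_mean = (other_sum + 1.0) / (other_n + 1)
    return max_other_mean > best_mean
```

With 10 samples each, `single_sample_can_change_best(8, 10, 5, 10)` is `False`. Several samples of the second action together could still overturn the ranking, which a myopic analysis does not see.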

As the above ultimate goal is extremely difficult to 
achieve, we introduce in this paper simple schemes more 
amenable to analysis, loosely based on the metareasoning 
concept of value of information, and compare them to UCB 
(on sets) and UCT (in trees). 

Sampling Based on Simple Regret 
Analysis of Sampling on Sets 

We examine two sampling schemes with super- 
polynomially decreasing upper bounds on the simple 
regret. The bounds suggest that these schemes achieve a 
lower simple regret than uniform sampling; indeed, this is 
confirmed by experiments. 

We first consider $\varepsilon$-greedy sampling as a straightforward generalization of uniform sampling:

Definition 3. The $\varepsilon$-greedy sampling scheme pulls the arm that currently has the greatest sample mean with probability $0 < \varepsilon < 1$, and any other arm with probability $\frac{1-\varepsilon}{K-1}$.
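A minimal sketch of Definition 3 in Python. Two assumptions are not covered by the definition: there are at least two arms, and arms never pulled are pulled first, because the current best arm is undefined until every sample mean exists. The name `epsilon_greedy_select` is hypothetical. Setting `epsilon=1/K` recovers uniform random sampling.

```python
import random

def epsilon_greedy_select(sums, counts, epsilon, rng):
    """epsilon-greedy sampling (Definition 3): pull the current best arm with
    probability epsilon, and each other arm with probability (1 - epsilon) / (K - 1)."""
    k = len(counts)
    for i, n_i in enumerate(counts):
        if n_i == 0:
            return i  # assumption: try every arm once before using sample means
    best = max(range(k), key=lambda i: sums[i] / counts[i])  # "current best" arm
    if rng.random() < epsilon:
        return best
    others = [i for i in range(k) if i != best]
    return rng.choice(others)  # uniform over the K - 1 remaining arms
```

For $\frac{1}{2}$-greedy sampling on three arms, the current best arm is drawn about half the time and each of the other two about a quarter of the time.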

This sampling scheme exhibits an exponentially decreas- 
ing simple regret: 

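Theorem 1. For every $0 < \eta < 1$ and $\gamma > 1$ there exists $N$ such that for any number of samples $n > N$ the simple regret of the $\varepsilon$-greedy sampling scheme is bounded from above as

$$\mathbb{E}r_{\varepsilon\text{-greedy}} \le 2\gamma \sum_{i=1}^{K} \Delta_i \exp\left(\frac{-2\Delta_i^2 n\varepsilon}{\left(1+\sqrt{\frac{(K-1)\varepsilon}{1-\varepsilon}}\right)^2}\right) \qquad (3)$$

with probability at least $1 - \eta$.

Proof outline: Bound the probability $P_i$ that a non-optimal arm $i$ is selected. Split the interval $[\mu_i, \mu^*]$ at $\mu_i + \delta_i$. Apply the Chernoff-Hoeffding bound to get:

$$P_i \le \Pr\left[\overline{X}_i > \mu_i + \delta_i\right] + \Pr\left[\overline{X}_* < \mu^* - (\Delta_i - \delta_i)\right] \le \exp\left(-2\delta_i^2 n_i\right) + \exp\left(-2(\Delta_i - \delta_i)^2 n_*\right) \qquad (4)$$

Observe that, in probability, $\overline{X}_i \to \mu_i$ as $n \to \infty$; therefore $n_* \to n\varepsilon$ and $n_i \to \frac{n(1-\varepsilon)}{K-1}$ as $n \to \infty$. Conclude that for every $0 < \eta < 1$, $\gamma > 1$ there exists $N$ such that for every $n > N$ and all non-optimal arms $i$:

$$P_i \le \gamma\left(\exp\left(\frac{-2\delta_i^2 n(1-\varepsilon)}{K-1}\right) + \exp\left(-2(\Delta_i - \delta_i)^2 n\varepsilon\right)\right) \qquad (5)$$

Require

$$\exp\left(\frac{-2\delta_i^2 n(1-\varepsilon)}{K-1}\right) = \exp\left(-2(\Delta_i - \delta_i)^2 n\varepsilon\right), \quad \text{i.e.,} \quad \delta_i = \frac{\Delta_i}{1+\sqrt{\frac{1-\varepsilon}{(K-1)\varepsilon}}} \qquad (6)$$

Substitute this $\delta_i$ into (5), so that $P_i \le 2\gamma\exp\left(-2(\Delta_i - \delta_i)^2 n\varepsilon\right)$ with $\Delta_i - \delta_i = \Delta_i / \left(1+\sqrt{\frac{(K-1)\varepsilon}{1-\varepsilon}}\right)$, and substitute the result into (2) to obtain

$$\mathbb{E}r_{\varepsilon\text{-greedy}} \le 2\gamma \sum_{i=1}^{K} \Delta_i \exp\left(\frac{-2\Delta_i^2 n\varepsilon}{\left(1+\sqrt{\frac{(K-1)\varepsilon}{1-\varepsilon}}\right)^2}\right) \qquad (7)$$

□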



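In particular, as the number of arms $K$ grows, the bound for $\frac{1}{2}$-greedy sampling ($\varepsilon = \frac{1}{2}$) becomes considerably tighter than for uniform random sampling ($\varepsilon = \frac{1}{K}$):

Corollary 1. For uniform random sampling,

$$\mathbb{E}r_{\text{uniform}} \le 2\gamma \sum_{i=1}^{K} \Delta_i \exp\left(-\frac{\Delta_i^2 n}{2K}\right) \qquad (8)$$

For $\frac{1}{2}$-greedy sampling,

$$\mathbb{E}r_{\frac{1}{2}\text{-greedy}} \le 2\gamma \sum_{i=1}^{K} \Delta_i \exp\left(-\frac{\Delta_i^2 n}{\left(1+\sqrt{K-1}\right)^2}\right) \approx 2\gamma \sum_{i=1}^{K} \Delta_i \exp\left(-\frac{\Delta_i^2 n}{K}\right) \quad \text{for } K \gg 1 \qquad (9)$$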
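To make Corollary 1 concrete, the right-hand side of the Theorem 1 bound can be evaluated numerically. This is a sketch that only computes the expression. It does not capture the "for $n > N$" and "with probability $1 - \eta$" qualifiers. The name `egreedy_bound` is hypothetical, and `deltas` is assumed to list $\Delta_i$ for all $K$ arms, with 0 for the best arm.

```python
import math

def egreedy_bound(deltas, n, epsilon, gamma=1.0):
    """Right-hand side of Theorem 1 for epsilon-greedy sampling:
    2*gamma * sum_i Delta_i * exp(-2 Delta_i^2 n eps / (1 + sqrt((K-1) eps / (1 - eps)))^2)."""
    k = len(deltas)
    denom = (1.0 + math.sqrt((k - 1) * epsilon / (1.0 - epsilon))) ** 2
    return 2.0 * gamma * sum(d * math.exp(-2.0 * d * d * n * epsilon / denom)
                             for d in deltas)
```

Take $K = 32$ and $\Delta_i = 0.1$ for the non-optimal arms. The per-arm exponent is then $-\Delta_i^2 n / 64$ for uniform sampling ($\varepsilon = 1/32$) and $-\Delta_i^2 n / (1+\sqrt{31})^2 \approx -\Delta_i^2 n / 43.1$ for $\frac{1}{2}$-greedy sampling. So for large $n$, `egreedy_bound(deltas, n, 0.5)` is the smaller of the two.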



$\varepsilon$-greedy is based solely on sampling the arm that has the greatest sample mean (henceforth called the "current best" arm) with a higher probability than the rest of the arms, and ignores information about the sample means of other arms. On the other hand, UCB distributes samples in accordance with the sample means but, in order to minimize the cumulative regret, chooses the current best arm too often. Intuitively, a better scheme for simple regret minimization would distribute samples in a way similar to UCB, but would sample the current best arm less often. This can be achieved by replacing $\log(\cdot)$ in Equation (1) with a faster-growing sublinear function, for example, $\sqrt{\cdot}$:




Definition 4. Scheme UCB$_{\sqrt{\cdot}}(c)$ pulls arm $i$ that maximizes $b_i$, where:

$$b_i = \overline{X}_i + \sqrt{\frac{c\sqrt{n}}{n_i}} \qquad (10)$$

where, as before, $\overline{X}_i$ is the average reward obtained from arm $i$, $n_i$ is the number of times arm $i$ was pulled, and $n$ is the total number of pulls so far.
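A minimal sketch of UCB$_{\sqrt{\cdot}}(c)$ from Equation (10). As with UCB(c), it assumes that arms never pulled are selected first. The name `ucb_sqrt_select` is hypothetical.

```python
import math

def ucb_sqrt_select(sums, counts, c=2.0):
    """UCB_sqrt(c) (Definition 4): maximize b_i = mean_i + sqrt(c * sqrt(n) / n_i).
    Assumption: arms never pulled are selected first."""
    for i, n_i in enumerate(counts):
        if n_i == 0:
            return i
    n = sum(counts)  # total number of pulls so far
    return max(range(len(counts)),
               key=lambda i: sums[i] / counts[i]
               + math.sqrt(c * math.sqrt(n) / counts[i]))
```

This shows how the change samples the current best arm less often. Take statistics of 900/1000 on arm 0 and 50/100 on arm 1. `ucb_sqrt_select` picks arm 1, because $\sqrt{1100} \approx 33.2$ inflates its exploration term. UCB(2) from Equation (1) would pick arm 0 under the same statistics, since its exploration term uses only $\log 1100 \approx 7.0$.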

This scheme also exhibits a super-polynomially decreas- 
ing simple regret: 

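Theorem 2. For every $0 < \eta < 1$ and $\gamma > 1$ there exists $N$ such that for any number of samples $n > N$ the simple regret of the UCB$_{\sqrt{\cdot}}(c)$ sampling scheme is bounded from above as

$$\mathbb{E}r_{\mathrm{UCB}_{\sqrt{\cdot}}(c)} \le 2\gamma \sum_{i=1}^{K} \Delta_i \exp\left(-\frac{c\sqrt{n}}{2}\right) \qquad (11)$$

with probability at least $1 - \eta$.

Proof outline: Bound the probability $P_i$ that a non-optimal arm $i$ is chosen. Split the interval $[\mu_i, \mu^*]$ at $\mu_i + \frac{\Delta_i}{2}$. Apply the Chernoff-Hoeffding bound to get:

$$P_i \le \Pr\left[\overline{X}_i > \mu_i + \frac{\Delta_i}{2}\right] + \Pr\left[\overline{X}_* < \mu^* - \frac{\Delta_i}{2}\right] \le \exp\left(-\frac{\Delta_i^2 n_i}{2}\right) + \exp\left(-\frac{\Delta_i^2 n_*}{2}\right) \qquad (12)$$

Observe that, in probability, $n_i \to \frac{c\sqrt{n}}{\Delta_i^2}$ and $n_i \le n_*$ as $n \to \infty$. Conclude that for every $0 < \eta < 1$, $\gamma > 1$ there exists $N$ such that for every $n > N$ and all non-optimal arms $i$:

$$P_i \le 2\gamma \exp\left(-\frac{c\sqrt{n}}{2}\right) \qquad (13)$$

Substitute (13) into (2) and obtain

$$\mathbb{E}r_{\mathrm{UCB}_{\sqrt{\cdot}}(c)} \le 2\gamma \sum_{i=1}^{K} \Delta_i \exp\left(-\frac{c\sqrt{n}}{2}\right) \qquad (14)$$

□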
Sampling in Trees

As mentioned above, UCT (Kocsis and Szepesvari 2006) is
an extension of UCB for MCTS, which applies UCB(c) at each
step of a rollout. At the root node, the sampling in MCTS is
usually aimed at finding the first move to perform. Search 
is re-started, either from scratch or using some previously 
collected information, after observing the actual outcome (in 
MDPs) or the opponent's move (in adversarial games). Once 
one move is shown to be the best choice with high confi- 
dence, the value of information of additional samples of the 
best move (or, in fact, of any other samples) is low. There- 
fore, one should be able to do better than UCT by optimizing 
simple regret, rather than cumulative regret, at the root node. 

Nodes deeper in the search tree are a different matter. 
In order to support an optimal move choice at the root, it 



is beneficial in many cases to find a more precise estimate 
of the value of the state in these search tree nodes. For 
these internal nodes, optimizing simple regret is not the an- 
swer, and cumulative regret optimization is not so far off 
the mark. Lacking a complete metareasoning theory for sampling,
which would indicate the optimal way to sample both root 
nodes and internal nodes, our suggested improvement to 
UCT thus combines different sampling schemes on the first 
step and during the rest of each rollout: 

Definition 5. The SR+CR MCTS sampling scheme selects an action at the current root node according to a scheme suitable for minimizing the simple regret (SR), such as $\frac{1}{2}$-greedy or UCB$_{\sqrt{\cdot}}$, and then (at non-root nodes) selects actions according to UCB, which approximately minimizes the cumulative regret (CR).

The pseudocode of this two-stage rollout for an undiscounted MDP is given in Algorithm 1. FirstAction selects the first step of a rollout (line 5), and NextAction (line 6) selects steps during the rest of the rollout (usually using UCB). The reward statistic for the selected action is updated (line 10), and the sample reward is back-propagated (line 11) towards the current root.

We denote such two-step realizations of SR+CR as Alg+UCT, where Alg is the sampling scheme employed at the first step of a rollout (e.g., $\frac{1}{2}$-greedy+UCT).

Algorithm 1 Two-stage Monte-Carlo tree search sampling

1: procedure Rollout(node, depth=1)
2:   if IsLeaf(node, depth) then
3:     return 0
4:   else
5:     if depth = 1 then action ← FirstAction(node)
6:     else action ← NextAction(node)
7:     next-node ← NextState(node, action)
8:     reward ← Reward(node, action, next-node)
9:         + Rollout(next-node, depth+1)
10:    UpdateStats(node, action, reward)
11:    return reward
12:  end if
13: end procedure
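Algorithm 1 can be turned into a runnable Python sketch, built on representation choices made here for illustration only. A hypothetical `Node` class stands for states; each leaf carries the mean of a Bernoulli reward collected on entering it. Transitions are deterministic and there is no depth cutoff, so IsLeaf only checks for children. FirstAction is $\frac{1}{2}$-greedy and NextAction is UCB(c), both trying unpulled actions first. The names `Node`, `ucb`, `half_greedy`, `rollout`, and `recommend` are all hypothetical.

```python
import math
import random

class Node:
    """Hypothetical tree node: internal nodes have one child per action;
    a leaf carries the mean p of a Bernoulli reward collected on entering it."""
    def __init__(self, children=(), p=0.0):
        self.children = list(children)
        self.p = p
        self.counts = [0] * len(self.children)  # n_i for each action
        self.sums = [0.0] * len(self.children)  # total reward for each action

def ucb(node, rng, c=2.0):
    """NextAction: UCB(c) over the node's actions; untried actions first."""
    for i, k in enumerate(node.counts):
        if k == 0:
            return i
    n = sum(node.counts)
    return max(range(len(node.counts)),
               key=lambda i: node.sums[i] / node.counts[i]
               + math.sqrt(c * math.log(n) / node.counts[i]))

def half_greedy(node, rng):
    """FirstAction: 1/2-greedy (current best with probability 1/2); untried actions first."""
    for i, k in enumerate(node.counts):
        if k == 0:
            return i
    best = max(range(len(node.counts)), key=lambda i: node.sums[i] / node.counts[i])
    if len(node.counts) == 1 or rng.random() < 0.5:
        return best
    return rng.choice([i for i in range(len(node.counts)) if i != best])

def rollout(node, rng, depth=1, first_action=half_greedy, next_action=ucb):
    """Two-stage rollout for an undiscounted problem, following Algorithm 1."""
    if not node.children:                    # IsLeaf (no depth cutoff in this sketch)
        return 0.0
    if depth == 1:
        action = first_action(node, rng)     # simple-regret scheme at the root
    else:
        action = next_action(node, rng)      # cumulative-regret scheme below the root
    next_node = node.children[action]        # NextState (deterministic here)
    # Reward: a Bernoulli draw when entering a leaf, 0 otherwise.
    reward = 1.0 if (not next_node.children and rng.random() < next_node.p) else 0.0
    reward += rollout(next_node, rng, depth + 1, first_action, next_action)
    node.counts[action] += 1                 # UpdateStats
    node.sums[action] += reward
    return reward                            # back-propagated to the caller

def recommend(root):
    """Final move choice: the root action with the greatest sample mean."""
    return max(range(len(root.counts)),
               key=lambda i: root.sums[i] / root.counts[i] if root.counts[i] else 0.0)
```

Consider a small two-level tree where root action 0 leads to leaves with means 0.2 and 0.9 and root action 1 to leaves with means 0.3 and 0.4. After a few thousand calls to `rollout(root, rng)`, `recommend(root)` is expected to return 0. The variant described in the next paragraph, UCB(c) at every step with a larger c at the root, can be tried by passing `first_action=lambda node, rng: ucb(node, rng, c=8.0)`; the value 8.0 is an arbitrary choice.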

We expect such two-stage sampling schemes to outperform UCT and to be significantly less sensitive to the tuning of the exploration factor $c$ of UCB(c). This is because the conflict between the need for a larger value of $c$ on the first step (simple regret) and a smaller value for the rest of the rollout (cumulative regret) (Bubeck, Munos, and Stoltz 2011) is resolved. In fact, a sampling scheme that uses UCB(c) at all steps, but with a larger value of $c$ for the first step than for the rest of the steps, should also outperform UCT.

VOI-aware Sampling 

Further improvement can be achieved by computing or es- 
timating the value of information (VOI) of the rollouts and 
choosing rollouts that maximize the VOI. However, as indi- 
cated above, actually computing the VOI is infeasible. In- 
stead we suggest the following scheme based on the follow- 
ing features of value of information: 



1. An estimate of the probability that one or more rollouts will make another action appear better than the current best $\alpha$.

2. An estimate of the gain that may be incurred if such a 
change occurs. 

If the distribution of results generated by the rollouts were 
known, the above features could be easily computed. How- 
ever, this is not the case for most MCTS applications. There- 
fore, we estimate bounds on the feature values from the cur- 
rent set of samples, based on the myopic assumption that the 
algorithm will only sample one of the actions, and use these 
bounds as the feature values, to get: 



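$$\mathrm{VOI}_\alpha \approx \frac{\overline{X}_\beta}{n_\alpha + 1} \exp\left(-2\left(\overline{X}_\alpha - \overline{X}_\beta\right)^2 n_\alpha\right)$$

$$\mathrm{VOI}_i \approx \frac{1 - \overline{X}_\alpha}{n_i + 1} \exp\left(-2\left(\overline{X}_\alpha - \overline{X}_i\right)^2 n_i\right), \quad i \ne \alpha \qquad (15)$$

where $\alpha = \arg\max_i \overline{X}_i$ and $\beta = \arg\max_{i \ne \alpha} \overline{X}_i$,

with $\mathrm{VOI}_\alpha$ being the (approximate) value for sampling the current best action, and $\mathrm{VOI}_i$ the (approximate) value for sampling some other action $i$.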

These equations were derived as follows. The gain from switching from the current best action $\alpha$ to another action can be bounded in two ways. When we sample only $\alpha$, it is bounded by the current expectation of the value of the current second-best action ($\overline{X}_\beta$). When we sample any other action, it is bounded by 1 (the maximum reward) minus the current expectation of $\alpha$ ($1 - \overline{X}_\alpha$). The probability that another action will be found best can be bounded by an exponential function of the difference in expectations when the true value of the actions becomes known. But the effect of each individual sample on the sample mean is inversely proportional to the current number of samples, hence the current number of samples (plus one, in order to handle the initial case of no previous samples) in the denominator.

These VOI estimates are used in the "VOI-aware" sampling scheme as follows: sample the action that has the maximum estimated VOI. We judged these estimates to be too crude to be used as "stopping criteria" that can be used to cut off sampling, leaving this issue for future research. Although this scheme appears too complicated to be amenable to a formal analysis, early experiments with this approach (see the VOI-aware MCTS experiments in the Empirical Evaluation section) demonstrate a significantly lower simple regret.
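A minimal sketch of the VOI-aware selection rule using the estimates in Equation (15), for rewards in [0, 1]. It assumes at least two actions. It also assumes that actions never sampled are pulled first, because their sample means are undefined, even though the "+1" in the denominators would otherwise tolerate $n_i = 0$. The name `voi_select` is hypothetical.

```python
import math

def voi_select(sums, counts):
    """VOI-aware sampling: pull the action with the largest VOI estimate (Equation 15).
    Assumptions: rewards in [0, 1], at least two actions, unsampled actions first."""
    k = len(counts)
    for i, n_i in enumerate(counts):
        if n_i == 0:
            return i
    means = [s / n_i for s, n_i in zip(sums, counts)]
    alpha = max(range(k), key=lambda i: means[i])                            # current best
    beta = max((i for i in range(k) if i != alpha), key=lambda i: means[i])  # second best

    def voi(i):
        if i == alpha:
            # Gain bounded by the second-best mean; alpha's mean must drop to beta's.
            return means[beta] / (counts[alpha] + 1) * math.exp(
                -2.0 * (means[alpha] - means[beta]) ** 2 * counts[alpha])
        # Gain bounded by 1 - mean_alpha; action i's mean must rise to alpha's.
        return (1.0 - means[alpha]) / (counts[i] + 1) * math.exp(
            -2.0 * (means[alpha] - means[i]) ** 2 * counts[i])

    return max(range(k), key=voi)
```

Unlike UCB, this rule can favor the current best action when the runner-up is close and both are well sampled. For example, with means 0.7, 0.6, 0.3 over 10 samples each, $\mathrm{VOI}_\alpha \approx 0.045$ beats $\mathrm{VOI}_1 \approx 0.022$. It switches to a rarely sampled challenger when the current best is already well sampled.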

Empirical Evaluation 

The results were empirically verified on Multi-armed Bandit instances, on search trees, and on the sailing domain, as defined in (Kocsis and Szepesvari 2006). In most cases, the experiments showed a lower average simple regret for $\frac{1}{2}$-greedy and UCB$_{\sqrt{\cdot}}$ than for UCB on sets, and for the SR+CR scheme than for UCT in trees.

Simple regret in multi-armed bandits 

Figure 1 presents a comparison of MCTS sampling schemes on Multi-armed bandits. Figure 1a shows the search tree corresponding to a problem instance. Each arm returns a random reward drawn from a Bernoulli distribution. The search selects an arm and compares the expected reward, unknown to the algorithm during the sampling, to the expected reward of the best arm.

Figure 1b shows the regret vs. the number of samples, averaged over 10000 experiments for randomly generated instances of 32 arms. Both $\frac{1}{2}$-greedy and UCB$_{\sqrt{\cdot}}$ dominate UCB over the whole range. For larger numbers of samples, the advantage of UCB$_{\sqrt{\cdot}}$ over $\frac{1}{2}$-greedy becomes more significant.



Figure 1: Simple regret in MAB. (a) Search tree, with best arm shaded; (b) regret vs. number of samples.



Monte Carlo tree search 

The second set of experiments was performed on randomly generated 2-level max-max trees crafted so as to deliberately deceive uniform sampling (Figure 2a), necessitating an adaptive sampling scheme such as UCT. This is due to the switch nodes, each with 2 children with anti-symmetric values, which would cause a uniform sampling scheme to incorrectly give them all a value of 0.5.
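The deception can be made concrete with a hypothetical parameterization in which a switch node's two children have values $v$ and $1 - v$. This is an assumption for illustration; the paper does not give the generator. Uniform sampling averages the two children, while the max-max value is the larger one. The name `switch_node_values` is hypothetical.

```python
def switch_node_values(v):
    """Return (uniform-sampling estimate, true max-max value) for a switch node
    whose two children have the anti-symmetric values v and 1 - v."""
    children = [v, 1.0 - v]
    uniform_estimate = sum(children) / len(children)  # always 0.5, whatever v is
    true_value = max(children)                        # what an adaptive scheme should find
    return uniform_estimate, true_value
```

For instance, `switch_node_values(0.25)` gives `(0.5, 0.75)`. Every switch node looks the same (0.5) to uniform sampling, even though their true values differ.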

Simple regret vs. the number of samples is shown for trees with root degree 16 (Figure 2b) and 64 (Figure 2c). The exploration factor $c$ is set to 2, the default value for rewards in the range [0, 1]. The algorithms exhibit a similar relative performance: either $\frac{1}{2}$-greedy+UCT or UCB$_{\sqrt{\cdot}}$+UCT results in the lowest regret, and UCB$_{\sqrt{\cdot}}$+UCT dominates UCT everywhere except when the number of samples is small. The advantage of both $\frac{1}{2}$-greedy+UCT and UCB$_{\sqrt{\cdot}}$+UCT grows with the number of arms.

The sailing domain 

Figures 3 to 5 show results of experiments on the sailing domain. Figure 3 shows the regret vs. the number of samples, computed for a range of values of $c$. Figure 3a shows the median cost, and Figure 3b the minimum cost. UCT is always worse than either $\frac{1}{2}$-greedy+UCT or UCB$_{\sqrt{\cdot}}$+UCT, and is sensitive to the value of $c$: the median cost is much higher than the minimum cost for UCT. For both $\frac{1}{2}$-greedy+UCT and UCB$_{\sqrt{\cdot}}$+UCT, the difference is significantly less prominent.

Figure 2: MCTS in random trees. (a) Search tree, with path to the best arm shaded.

Figure 4 shows the regret vs. the exploration factor for different numbers of samples. UCB$_{\sqrt{\cdot}}$+UCT is always better than UCT, and $\frac{1}{2}$-greedy+UCT is better than UCT except for a small range of values of the exploration factor.

Figure 5 shows the cost vs. the exploration factor for lakes of different sizes. The relative difference between the sampling schemes becomes more prominent as the lake size increases.

VOI-aware MCTS 

Finally, the VOI-aware sampling scheme was empirically compared to other sampling schemes (UCT, $\frac{1}{2}$-greedy+UCT, UCB$_{\sqrt{\cdot}}$+UCT). Again, the experiments were performed on randomly generated trees with the structure shown in Figure 2a. Figure 6 shows the results for 32 arms. VOI+UCT, the scheme based on a VOI estimate, outperforms all other sampling schemes in this example. Similar performance improvements (not shown) also occur for the sailing domain.

Figure 3: The sailing domain, 6x6 lake, cost vs. samples
Figure 4: The sailing domain, 6x6 lake, cost vs. factor
Figure 5: The sailing domain, 397 rollouts, cost vs. factor
Figure 6: MCTS in random trees, including VOI+UCT.



Conclusion and Future Work 

UCT-based Monte-Carlo tree search has been shown to be 
very effective for finding good actions in both MDPs and 
adversarial games. Further improvement of the sampling 
scheme is thus of interest in numerous search applications. 
We argue that although UCT is already very efficient, one 
can do better if the sampling scheme is considered from a 
metareasoning perspective of value of information (VOI). 

The MCTS SR+CR scheme presented in the paper dif- 
fers from UCT mainly in the first step of the rollout, when 
we attempt to minimize the 'simple' selection regret rather 
than the cumulative regret. Both the theoretical analysis and 
the empirical evaluation provide evidence for better general 
performance of the proposed scheme. 

Although SR+CR is inspired by the notion of VOI, the 
VOI is used there implicitly in the analysis of the algo- 
rithm, rather than computed or learned explicitly in order 



to plan the rollouts. Ideally, using VOI to control sampling 
ab-initio should do even better, but the theory for doing that 
is still not up to speed. Instead we suggest a "VOI-aware" 
sampling scheme based on crude probability and value esti- 
mates, which despite its simplicity already shows a marked 
improvement in minimizing regret. However, application of the theory of rational metareasoning to Monte Carlo Tree Search is an open problem (Hay and Russell 2011), and both a solid theoretical model and empirically efficient VOI estimates need to be developed.

Finding a better sampling scheme for non-root nodes, as 
well as the root node, should also be possible. Although cu- 
mulative regret does reasonably well there, it is far from 
optimal, as meta-reasoning principles imply that an optimal 
scheme for these nodes must be asymmetrical (e.g. it is not 
helpful to find out that the value of the current best action is 
even better than previously believed). 

Finally, applying VOI methods in complex deployed applications that already use MCTS is a challenge that should be addressed in future work. In particular, UCT is extremely successful in Computer Go (Gelly and Wang 2006; Braudis and Loup Gailly 2011; Enzenberger and Müller 2009), and the proposed scheme should be evaluated on this domain. This is non-trivial, since Go programs typically use "non-pure" versions of UCT, extended with domain-specific knowledge. For example, Pachi (Braudis and Loup Gailly 2011) typically re-uses information from rollouts generated for earlier moves, thereby violating our underlying assumption that information is only used for selecting the current move. In early experiments not shown here (disallowing re-use of samples, admittedly not really a fair comparison), the VOI-aware scheme appears to dominate UCT. Nevertheless, it should also be possible to adapt the VOI-aware schemes to take into account the expected re-use of samples, another topic for future research.